54 research outputs found
Pure Exploration with Multiple Correct Answers
We determine the sample complexity of pure exploration bandit problems with
multiple good answers. We derive a lower bound using a new game equilibrium
argument. We show how continuity and convexity properties of single-answer
problems ensure that the Track-and-Stop algorithm has asymptotically optimal
sample complexity. However, that convexity is lost when going to the
multiple-answer setting. We present a new algorithm which extends
Track-and-Stop to the multiple-answer case and has asymptotic sample complexity
matching the lower bound.
Second-order Quantile Methods for Experts and Combinatorial Games
We aim to design strategies for sequential decision making that adjust to the
difficulty of the learning problem. We study this question both in the setting
of prediction with expert advice, and for more general combinatorial decision
tasks. We are not satisfied with just guaranteeing minimax regret rates, but we
want our algorithms to perform significantly better on easy data. Two popular
ways to formalize such adaptivity are second-order regret bounds and quantile
bounds. The underlying notions of 'easy data', which may be paraphrased as "the
learning problem has small variance" and "multiple decisions are useful", are
synergetic. But even though there are sophisticated algorithms that exploit one
of the two, no existing algorithm is able to adapt to both.
In this paper we outline a new method for obtaining such adaptive algorithms,
based on a potential function that aggregates a range of learning rates (which
are essential tuning parameters). By choosing the right prior we construct
efficient algorithms and show that they reap both benefits by proving the first
bounds that are both second-order and incorporate quantiles.
Universal Codes from Switching Strategies
We discuss algorithms for combining sequential prediction strategies, a task
which can be viewed as a natural generalisation of the concept of universal
coding. We describe a graphical language based on Hidden Markov Models for
defining prediction strategies, and we provide both existing and new models as
examples. The models include efficient, parameterless models for switching
between the input strategies over time, including a model for the case where
switches tend to occur in clusters, and finally a new model for the scenario
where the prediction strategies have a known relationship, and where jumps are
typically between strongly related ones. This last model is relevant for coding
time series data where parameter drift is expected. As theoretical contributions
we introduce an interpolation construction that is useful in the development
and analysis of new algorithms, and we establish a new sophisticated lemma for
analysing the individual sequence regret of parameterised models.
Online Isotonic Regression
We consider the online version of the isotonic regression problem. Given a
set of linearly ordered points (e.g., on the real line), the learner must
predict labels sequentially at adversarially chosen positions and is evaluated
by her total squared loss compared against the best isotonic (non-decreasing)
function in hindsight. We survey several standard online learning algorithms
and show that none of them achieve the optimal regret exponent; in fact, most
of them (including Online Gradient Descent, Follow the Leader and Exponential
Weights) incur linear regret. We then prove that the Exponential Weights
algorithm played over a covering net of isotonic functions has a regret bounded
by $O(T^{1/3} \log^{2/3}(T))$ and present a matching $\Omega(T^{1/3})$
lower bound on regret. We provide a computationally efficient version of this
algorithm. We also analyze the noise-free case, in which the revealed labels
are isotonic, and show that the bound can be improved to $O(\log T)$ or even to $O(1)$
(when the labels are revealed in isotonic order). Finally, we extend the
analysis beyond squared loss and give bounds for entropic loss and absolute
loss.
Comment: 25 pages
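The comparator in this abstract, the best isotonic (non-decreasing) function in hindsight under squared loss, can be computed offline with the classical Pool Adjacent Violators procedure. The following is a minimal sketch of that offline computation only, not the online algorithm from the paper; the function names `isotonic_fit` and `squared_loss` are illustrative, and the labels are assumed to be already sorted by position.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y (Pool Adjacent Violators).

    Assumes y lists the labels in order of their positions.
    """
    blocks = []  # each block is [sum of labels, number of labels]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def squared_loss(y, predictions):
    """Total squared loss of predictions against labels y."""
    return sum((a - b) ** 2 for a, b in zip(y, predictions))


labels = [1, 3, 2, 4]
best = isotonic_fit(labels)
print(best, squared_loss(labels, best))
```

The regret of an online learner is then its own total squared loss minus `squared_loss(labels, isotonic_fit(labels))`.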
Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of
any special structure that might be present in the learning task at hand, with
as little manual tuning by the user as possible. A fundamental obstacle that
comes up in the design of such adaptive algorithms is to calibrate a so-called
step-size or learning rate hyperparameter depending on variance, gradient
norms, etc. A recent technique promises to overcome this difficulty by
maintaining multiple learning rates in parallel. This technique has been
applied in the MetaGrad algorithm for online convex optimization and the Squint
algorithm for prediction with expert advice. However, in both cases the user
still has to provide in advance a Lipschitz hyperparameter that bounds the norm
of the gradients. Although this hyperparameter is typically not available in
advance, tuning it correctly is crucial: if it is set too small, the methods
may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by
designing new versions of MetaGrad and Squint that adapt to its optimal value
automatically. We achieve this by dynamically updating the set of active
learning rates. For MetaGrad, we further improve the computational efficiency
of handling constraints on the domain of prediction, and we remove the need to
specify the number of rounds in advance.
Comment: 22 pages. To appear in COLT 201
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
We consider online learning algorithms that guarantee worst-case regret rates
in adversarial environments (so they can be deployed safely and will perform
robustly), yet adapt optimally to favorable stochastic environments (so they
will perform well in a variety of settings of practical importance). We
quantify the friendliness of stochastic environments by means of the well-known
Bernstein (a.k.a. generalized Tsybakov margin) condition. For two recent
algorithms (Squint for the Hedge setting and MetaGrad for online convex
optimization) we show that the particular form of their data-dependent
individual-sequence regret guarantees implies that they adapt automatically to
the Bernstein parameters of the stochastic environment. We prove that these
algorithms attain fast rates in their respective settings both in expectation
and with high probability.
Lipschitz and Comparator-Norm Adaptivity in Online Learning
We study Online Convex Optimization in the unbounded setting where neither
predictions nor gradients are constrained. The goal is to simultaneously adapt
to both the sequence of gradients and the comparator. We first develop
parameter-free and scale-free algorithms for a simplified setting with hints.
We present two versions: the first adapts to the squared norms of both
comparator and gradients separately using $O(d)$ time per round, the second
adapts to their squared inner products (which measure variance only in the
comparator direction) in $O(d^3)$ time per round. We then generalize two prior
reductions to the unbounded setting; one to not need hints, and a second to
deal with the range ratio problem (which already arises in prior work). We
discuss their optimality in light of prior and new lower bounds. We apply our
methods to obtain sharper regret bounds for scale-invariant online prediction
with linear models.
Comment: 30 pages, 1 figure
Kolmogorov Complexity Theory over the Reals
Kolmogorov Complexity constitutes an integral part of computability theory,
information theory, and computational complexity theory -- in the discrete
setting of bits and Turing machines. Over real numbers, on the other hand, the
BSS-machine (aka real-RAM) has been established as a major model of
computation. This real realm has turned out to exhibit natural counterparts to
many notions and results in classical complexity and recursion theory; although
usually with considerably different proofs. The present work investigates
similarities and differences between discrete and real Kolmogorov Complexity as
introduced by Montana and Pardo (1998).
Adaptive Hedge
Most methods for decision-theoretic online learning are based on the Hedge
algorithm, which takes a parameter called the learning rate. In most previous
analyses the learning rate was carefully tuned to obtain optimal worst-case
performance, leading to suboptimal performance on easy instances, for example
when there exists an action that is significantly better than all others. We
propose a new way of setting the learning rate, which adapts to the difficulty
of the learning problem: in the worst case our procedure still guarantees
optimal performance, but on easy instances it achieves much smaller regret. In
particular, our adaptive method achieves constant regret in a probabilistic
setting, when there exists an action that on average obtains strictly smaller
loss than all other actions. We also provide a simulation study comparing our
approach to existing methods.
Comment: This is the full version of the paper with the same name that will
appear in Advances in Neural Information Processing Systems 24 (NIPS 2011),
2012. The two papers are identical, except that this version contains an
extra section of Additional Material.
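To make the role of the learning rate concrete, here is a minimal sketch of the standard Hedge algorithm with a fixed learning rate, run on an "easy" instance where one action is always better than the others. It is not the adaptive procedure proposed in the paper; the function name `hedge_regret`, the toy loss sequence, and the two learning rates compared are all illustrative assumptions.

```python
import math


def hedge_regret(losses, eta):
    """Regret of Hedge with fixed learning rate eta.

    losses: list of per-round loss vectors with entries in [0, 1].
    Returns the learner's cumulative expected loss minus the best action's.
    """
    num_actions = len(losses[0])
    cumulative = [0.0] * num_actions  # cumulative loss of each action
    learner_loss = 0.0
    for loss in losses:
        # Exponential weights; subtract the minimum for numerical stability.
        low = min(cumulative)
        weights = [math.exp(-eta * (c - low)) for c in cumulative]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, loss))
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    return learner_loss - min(cumulative)


# Toy easy instance (assumed for illustration): action 0 never incurs loss.
T, K = 1000, 3
easy = [[0.0, 1.0, 1.0] for _ in range(T)]

eta_worst_case = math.sqrt(8 * math.log(K) / T)  # standard worst-case tuning
print("worst-case tuned eta:", hedge_regret(easy, eta_worst_case))
print("large eta:", hedge_regret(easy, 2.0))
```

On this instance the larger learning rate concentrates on the good action much sooner, which is the kind of gap between worst-case tuning and easy data that the abstract describes.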